Computer Science and Artificial Intelligence Laboratory
Victim Migration: Dynamically Adapting Between Private and Shared CMP Caches
Abstract
Future chip multiprocessors (CMPs) will have more cores and greater on-chip cache capacity. The on-chip cache can either be divided into separate private L2 caches for each core, or treated as one large shared L2 cache. Private caches provide low hit latency but low capacity, while shared caches have higher hit latency but greater capacity. Victim replication was previously introduced as a way of reducing the average hit latency of a shared cache by allowing a processor to make a replica of a primary cache victim in its local slice of the global L2 cache. Although victim replication performs well on multithreaded and single-threaded codes, it performs worse than the private scheme for multiprogrammed workloads, where there is little sharing between the programs running at the same time. In this paper, we propose victim migration, which improves on victim replication by adding a set of migration tags on each node; these tags are used to implement an exclusive cache policy for replicas. When a replica has been created on a remote node, the line is not also cached on the home node, but only recorded in the home node's migration tags. This frees up space on the home node to store shared global lines or replicas for the local processor. We show that victim migration performs better than private, shared, and victim replication schemes across a range of single-threaded, multithreaded, and multiprogrammed workloads, while using less area than a private cache design. Victim migration reduces average memory access latency by up to 10% compared with victim replication.
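The exclusive policy described in the abstract can be illustrated with a minimal sketch. This is a toy software model, not the hardware protocol from the paper: the class `HomeSlice`, its fields, and the methods `migrate_out` and `lookup` are hypothetical names chosen for illustration, and the model ignores coherence, replacement, and capacity limits. It only shows the bookkeeping idea that once a replica exists on a remote node, the home node keeps a tag instead of the data.

```python
from dataclasses import dataclass, field


@dataclass
class HomeSlice:
    """Toy model of one node's L2 slice acting as the home for some lines.

    All names here are illustrative assumptions, not taken from the paper.
    """
    # Addresses whose data is currently stored in this home slice.
    data: set = field(default_factory=set)
    # Migration tags: address -> id of the remote node holding the replica.
    migration_tags: dict = field(default_factory=dict)

    def migrate_out(self, addr, remote_node):
        """A remote node created a replica: drop the data, keep only a tag."""
        self.data.discard(addr)
        self.migration_tags[addr] = remote_node

    def lookup(self, addr):
        """Report where the line lives from the home node's point of view."""
        if addr in self.data:
            return ("home", None)
        if addr in self.migration_tags:
            return ("migrated", self.migration_tags[addr])
        return ("miss", None)


home = HomeSlice(data={0x40, 0x80})
home.migrate_out(0x40, remote_node=3)
result_migrated = home.lookup(0x40)   # line is tracked by a tag only
result_home = home.lookup(0x80)       # line is still stored at home
result_miss = home.lookup(0xC0)       # line is unknown to this slice
```

In this sketch, the data entry freed by `migrate_out` stands for the space the abstract says becomes available for shared global lines or for replicas belonging to the local processor.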
Related papers
Balancing Capacity and Latency in CMP Caches
The large working sets of commercial and scientific workloads stress the L2 caches of Chip Multiprocessors (CMPs). Some CMPs use a shared L2 cache to maximize the on-chip cache capacity and minimize misses. Others use private L2 caches, replicating data to limit the delay due to global wires and minimize cache access time. Recent hybrid proposals strive to balance latency and capacity, but use ...
BP-NUCA: Cache Pressure-Aware Migration for High-Performance Caching in CMPs
As the momentum behind Chip Multi-Processors (CMPs) continues to grow, Last Level Cache (LLC) management becomes a crucial issue to CMPs because off-chip accesses often involve a big latency. Private cache design is distinguished by smaller local access latency, good performance isolation and easy scalability, thus is becoming an attractive design alternative for LLC of CMPs. This paper propose...
WARP: Workload Nature Adaptive Replacement Policy
In the present universal scenario where the dependence on heterogeneous multi-core processors is tremendous, dealing with algorithms focusing on coherence between shared caches is imperative. WARP redesigns the replacement policy in the last level cache. In this policy, the shared (clean) lines and the private exclusive line are given the first two priorities dynamically followed by private mod...
Understanding the Limits of Capacity Sharing in CMP Private Caches
Chip Multi Processor (CMP) systems present interesting design challenges at the lower levels of the cache hierarchy. Private L2 caches allow easier processor-cache design reuse, thus scaling better than a system with a shared L2 cache, while offering better performance isolation and lower access latency. While some private cache management schemes that utilize space in peer private L2 caches ha...